# Exploring Semantic Clustering in Deep Reinforcement Learning for Video Games

This repository is the official implementation of [Exploring Semantic Clustering in Deep Reinforcement Learning for Video Games].

## Requirements

To install requirements:

```setup
pip install -r requirements.txt
```

>📋  To install Procgen, please follow the steps in [this repo](https://github.com/openai/procgen).
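
Before starting a long training run, it can help to confirm that the packages are importable. The snippet below is a minimal sketch using only the standard library; the choice to check `gym` alongside `procgen` is an assumption (Procgen environments are created through Gym), and the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "gym" is checked as an assumption; adjust the list to your setup.
    for name in ("procgen", "gym"):
        status = "found" if is_installed(name) else "MISSING"
        print(f"{name}: {status}")
```

If either package is reported as missing, revisit the installation steps above before training.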

## Training

To train the skill model(s) on the full distribution, as in the paper, run this command:

```
python -m train_procgen.train_sppo --env_name <ENV_NAME> --num_levels 0 --distribution_mode easy --timesteps_per_proc 25000000 --rand_seed <RAND_SEED>
```

To train the baseline model(s) on the full distribution, as in the paper, run this command:

```
python -m train_procgen.train_ppo --env_name <ENV_NAME> --num_levels 0 --distribution_mode easy --timesteps_per_proc 25000000 --rand_seed <RAND_SEED>
```
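
To cover several games and seeds, the two commands above can be generated in a loop. The following is a minimal sketch, not part of the repository: the helper names, the example game list, and the seed values are placeholders chosen for illustration, not the settings used in the paper. It only prints the commands unless `dry_run` is set to `False`.

```python
import itertools
import shlex
import subprocess


def build_train_command(module, env_name, rand_seed,
                        timesteps=25_000_000, python="python"):
    """Build one training command using the flags shown in this README."""
    return [
        python, "-m", f"train_procgen.{module}",
        "--env_name", env_name,
        "--num_levels", "0",
        "--distribution_mode", "easy",
        "--timesteps_per_proc", str(timesteps),
        "--rand_seed", str(rand_seed),
    ]


def build_sweep(modules, env_names, seeds):
    """Return one command per (module, game, seed) combination."""
    return [
        build_train_command(module, env_name, seed)
        for module, env_name, seed in itertools.product(modules, env_names, seeds)
    ]


if __name__ == "__main__":
    dry_run = True  # set to False to actually launch the runs, one after another
    commands = build_sweep(
        modules=["train_sppo", "train_ppo"],  # skill model and baseline
        env_names=["coinrun", "fruitbot"],    # placeholder game list
        seeds=[0, 1, 2],                      # placeholder seeds
    )
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

Running the runs sequentially keeps the sketch simple; each run uses 25M timesteps per process, so in practice you may prefer to dispatch them to separate machines or a job scheduler.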

>📋  All trained models and evaluation results are saved in the folder ./train_procgen/checkpoints

## Figure and video generation

>📋 We provide the evaluation files for all games, so reviewers can generate the performance figures for these games with the following commands:

To generate the generalization figure for a single game:

```
cd train_procgen
python single_graph.py --env_name <ENV_NAME>

# Example:
python single_graph.py --env_name coinrun
```

To generate the generalization figure for all games:

```
cd train_procgen
python graph.py
```

To generate the embedding-space figures:

```
python -m train_procgen.enjoy_sppo --env_name <ENV_NAME> --mode 1
```

To generate the skill videos:

```
python -m train_procgen.enjoy_sppo --env_name <ENV_NAME> --mode 0
```

>📋  Bonus: We also provide the three checkpoints used to generate the videos in the supplementary materials, so you can reproduce those skill videos for Fruitbot, Jumper and Ninja directly, without training.

```
python -m train_procgen.enjoy_sppo --env_name fruitbot --mode 0
python -m train_procgen.enjoy_sppo --env_name jumper --mode 0
python -m train_procgen.enjoy_sppo --env_name ninja --mode 0
```

>📋  Note that video generation may take a while (usually 30-60 minutes, depending on the machine), since it needs to explore enough frames for every cluster.

>📋  All generated figures and videos are saved in the folder ./train_procgen/figures
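
To check what has been produced so far, the output folder can be summarized by file type. This is a minimal sketch using only the standard library; it makes no assumption about which file extensions the scripts write, and the helper name `group_by_suffix` is hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def group_by_suffix(folder):
    """Group all files under `folder` (recursively) by file extension."""
    root = Path(folder)
    if not root.is_dir():
        return {}
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[path.suffix.lower() or "(no extension)"].append(path)
    return dict(groups)


if __name__ == "__main__":
    found = group_by_suffix("./train_procgen/figures")
    if not found:
        print("No generated files found yet.")
    for suffix, paths in sorted(found.items()):
        print(f"{suffix}: {len(paths)} file(s)")
```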

## Hover clusters

To collect states and run the hover visualization, you first need the corresponding checkpoint (checkpoints for FruitBot, Jumper and Ninja are provided in the supplementary material). Then run:

```
python -m train_procgen.hover_clusters --env_name <ENV_NAME>
```

The running time is about 10 to 20 minutes.

Example:

```
python -m train_procgen.hover_clusters --env_name fruitbot
```
Enjoy!